feat(copilot): MCTS dynamic strategy engine with reward model and rollout simulation by ldemon2333 · Pull Request #13 · AnnaSuSu/TechSpar

ldemon2333 · 2026-04-07T16:45:37Z

Summary

Add an MCTS (Monte Carlo Tree Search) dynamic strategy engine to the interview copilot, enabling real-time strategy optimization during mock interviews.

Changes

New Modules (`backend/copilot/`)

mcts_config.py — MCTSConfig, MCTSNode, StrategyRecommendation data structures
reward_model.py — RewardModel with cosine similarity scoring: R(S) = W1·Match_JD + W2·Safe - W3·Risk
simulation_engine.py — 3-level degradation rollout simulator (LLM → lightweight LLM → pure reward)
mcts_engine.py — Full MCTS 4-step engine (Select/Expand/Simulate/Backprop) with PUCT selection
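
The PUCT selection and backpropagation steps named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `Node` class and its field names are hypothetical stand-ins for MCTSNode, and the formula is the commonly used AlphaGo-style PUCT with the prior taken from an LLM confidence value.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for MCTSNode; field names are assumed."""
    prior: float                      # e.g. LLM confidence for this strategy
    visits: int = 0
    value_sum: float = 0.0
    children: list[Node] = field(default_factory=list)

    @property
    def q(self) -> float:
        # Mean backed-up reward; 0.0 for a node that was never visited.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.4) -> float:
    # Exploitation term (Q) plus a prior-weighted exploration bonus (U).
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + bonus


def select_child(parent: Node, c_puct: float = 1.4) -> Node:
    # Select step: pick the child with the highest PUCT score.
    return max(parent.children, key=lambda ch: puct_score(parent, ch, c_puct))


def backpropagate(path: list[Node], reward: float) -> None:
    # Backprop step: every node on the selected path absorbs the rollout reward.
    for node in path:
        node.visits += 1
        node.value_sum += reward
```

A child with a high prior keeps a large exploration bonus until its own visit count grows, which is why a confident LLM suggestion gets explored early even before its mean reward is known.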

Modified Files

config.py — 11 new mcts_* settings (feature-flagged, disabled by default)
llm_provider.py — get_mcts_rollout_llm() for simulation
main.py — Integration into copilot WebSocket session as async background task

Frontend

frontend/src/hooks/useCopilotStream.js — Add strategy_recommendation case to WebSocket message switch, ensuring MCTS search results are forwarded to the UI via onUpdate callback (without this the backend pushes the message but the frontend silently drops it)

Bug Fixes

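ASR startup fix: the NLS SDK's start() returns None rather than a truthy value, so the check now uses try/except
MCTS cleanup on WebSocket disconnect: added an mcts.stop() call in the finally block so the search Task cannot write to a closed WS
No redundant MCTS search after a candidate answer: search is triggered only when HR speaks
_try_merge_static variable naming: _ → matched_node_id (a value that is actually used should not be a throwaway name)
Expansion depth uses a config value: added max_expansion_depth to replace the hardcoded 3
Embedding calls no longer block the event loop: the synchronous API is wrapped in asyncio.to_thread()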
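
The ASR startup fix in this list can be illustrated with a small sketch. It relies only on what the fix states, namely that start() returns None on success. `FakeTranscriber` and `start_asr` are hypothetical names, and treating a raised exception as the failure signal is an assumption about the SDK's behavior.

```python
class FakeTranscriber:
    """Hypothetical stand-in for an SDK object whose start() returns None."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def start(self) -> None:
        if self.fail:
            raise RuntimeError("connection refused")
        return None


def start_asr(transcriber) -> bool:
    # `if transcriber.start():` would misread a successful None as a failure,
    # so success is defined as "no exception was raised" instead.
    try:
        transcriber.start()
    except Exception:
        return False
    return True
```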

Docs and Tests

docs/mcts-strategy.md — User-facing feature documentation
docs/SUMMARY.md — Updated index
tests/test_mcts_engine.py — 39 unit tests covering all modules

Key Design Decisions

Feature-flagged: MCTS_ENABLED=false by default, zero impact when disabled
PUCT variant: AlphaGo-style selection with LLM confidence as prior, c_puct=1.4
Pure numpy: No heavy ML dependencies, less than 10ms per reward evaluation
Graceful degradation: Falls back to reward-only evaluation if LLM rollout fails
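
The reward formula from the module list, R(S) = W1·Match_JD + W2·Safe - W3·Risk, and the fallback to reward-only evaluation can be sketched together. Everything below is an assumption made for illustration: the weight values, the idea that Safe and Risk are cosine similarities against anchor embeddings, and the function names. It uses only the standard library to stay dependency-free, whereas the PR reports using numpy.

```python
import asyncio
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reward(strategy, jd, safe, risk, w1=0.5, w2=0.3, w3=0.2) -> float:
    # R(S) = W1*Match_JD + W2*Safe - W3*Risk; weights here are made up.
    return (w1 * cosine(strategy, jd)
            + w2 * cosine(strategy, safe)
            - w3 * cosine(strategy, risk))


async def rollout_value(strategy, anchors, evaluators):
    """Try each evaluator in order; fall back to the pure reward if all fail.

    `evaluators` is a list of (name, async callable) pairs, for example a
    full LLM rollout followed by a lightweight LLM rollout.
    """
    for name, evaluate in evaluators:
        try:
            return await evaluate(strategy), name
        except Exception:
            continue  # degrade to the next, cheaper level
    return reward(strategy, *anchors), "reward_only"
```

Returning the level name alongside the value makes it easy to log how often the search had to degrade.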

AnnaSuSu

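The overall design is good: the game-theoretic modeling is clear, the module split is clean, the feature flag is non-invasive, and the degradation strategy has been thought through. The following issues need to be fixed first:

Bugs (must fix)

1. `_get_weak_points` always returns an empty list

mcts_engine.py:506 reads weak_points from prep_state.get("profile", {}), but prep_result has no "profile" key. The candidate profile is not in the prep state. Either read from fit_report.get("gaps", []) instead, or pass the profile into prep_state in _init_mcts_engine.

2. The MCTS engine is not cleaned up when the WebSocket disconnects

The finally block in main.py only cleans up ASR and never calls mcts.stop(). After a disconnect the search Task keeps running and then tries to ws.send_json() to the closed WebSocket. Add: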

finally:
    if session and session.get("asr"):
        session["asr"].shutdown()
    if session and session.get("mcts_engine"):
        await session["mcts_engine"].stop()
    _copilot_sessions.pop(session_id, None)
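
For the cleanup above to work, stop() has to cancel the background search and wait for it to finish. A minimal sketch of one way to do that follows; `SearchRunner` is a hypothetical class, since the real MCTSEngine API is not shown in this thread.

```python
import asyncio
import contextlib
from typing import Optional


class SearchRunner:
    """Hypothetical wrapper that owns one background search task."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    def start(self, coro) -> None:
        self._task = asyncio.create_task(coro)

    async def stop(self) -> None:
        # Detach the task first so a second stop() call is a no-op.
        task, self._task = self._task, None
        if task is None or task.done():
            return
        task.cancel()
        # Wait until the cancellation has actually been processed, so the
        # task cannot reach a send on the closed socket afterwards.
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

Awaiting the cancelled task matters: calling cancel() alone only requests cancellation, and the task could still be running when the session is removed.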

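3. The logic that triggers an MCTS search after the candidate answers is problematic

main.py calls create_task(_run_mcts_and_push) again after on_candidate_response, but at that point the root node is still HR's question from the previous turn. The candidate has already answered, so searching candidate strategies on the old root is pointless. Suggestions:

Remove the MCTS trigger after the candidate answers
Or re-root the tree on the candidate's answer and search to predict HR's next follow-up question

Please confirm the design intent here.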

Suggested improvements

4. `_try_merge_static` uses `_` as a variable name but actually reads it

_, static_intent, score = self.navigator.match_utterance(...)
static_node = self.navigator.get_node(_)

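By convention _ is a throwaway name, but here it is used as the node_id. Please rename it.

5. Hardcoded expansion depth

In _run_iteration, leaf.depth < 3 is hardcoded; the config has rollout_depth, but it is not used. Use the config value, or add a separate max_expansion_depth.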
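
A small sketch of the second option, a dedicated max_expansion_depth setting, is shown here. The `ExpansionConfig` class, the `should_expand` helper, and the default values are assumptions for illustration; only the two setting names come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpansionConfig:
    rollout_depth: int = 3          # how deep a simulation may play out (assumed default)
    max_expansion_depth: int = 3    # how deep the tree itself may grow (assumed default)


def should_expand(leaf_depth: int, cfg: ExpansionConfig) -> bool:
    # Replaces a literal `leaf.depth < 3` with the configured limit.
    return leaf_depth < cfg.max_expansion_depth
```

Keeping the two limits separate lets the tree depth and the simulation depth be tuned independently.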

6. The synchronous `get_text_embedding()` call blocks the event loop

_expand and on_hr_utterance call embed.get_text_embedding() directly, which blocks if an API-based embedding is used. Wrap it in asyncio.to_thread().
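
The suggested wrapping can be sketched as below. `SlowEmbedder` is a hypothetical stand-in for a synchronous, network-bound embedding client, and `embed_async` is an illustrative helper name; `asyncio.to_thread` itself is standard library (Python 3.9+).

```python
import asyncio
import time


class SlowEmbedder:
    """Hypothetical synchronous client; sleep() simulates a network call."""

    def get_text_embedding(self, text: str) -> list:
        time.sleep(0.05)
        return [float(len(text))]


async def embed_async(embed, text: str) -> list:
    # Run the blocking call in a worker thread so the event loop stays free
    # to service WebSocket traffic while the embedding is computed.
    return await asyncio.to_thread(embed.get_text_embedding, text)
```

Inside a coroutine this is used as `vec = await embed_async(embed, utterance)` in place of the direct synchronous call.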

I'll take another look once 1-3 are fixed; the rest do not block merging.

ldemon2333 · 2026-04-08T06:35:47Z

Bugs (must fix)

1. _get_weak_points always returns an empty list: prep_result = prep_data["result"], which is stored via prep_store.set_done(prep_id, result). The result is initialized in run_copilot_prep, which also initializes profile. In my testing I never saw this problem, whether loading a resume for the first time or reusing a historical user profile, so I made no change.
2. The MCTS engine is not cleaned up when the WebSocket disconnects: fixed, mcts.stop() is now called.
3. The logic that triggers an MCTS search after the candidate answers is problematic: fixed as suggested (remove the MCTS trigger after the candidate answers, or re-root on the candidate's answer and search to predict HR's next follow-up question).
4. _try_merge_static uses _ as a variable name but actually reads it: fixed.
5. Hardcoded expansion depth: fixed.
6. The synchronous get_text_embedding() call blocks the event loop: fixed.

While working on the real-time voice copilot I also found two problems and have filed issues for them:

All ASR audio is hardcoded to the HR role; there is no concept of speaker role, so the feature is not yet complete enough for real-world use.
After a few HR questions are entered in the interview copilot, the decision-tree analysis panel on the right side of the frontend disappears; this is probably a frontend bug.

AnnaSuSu · 2026-04-08T07:07:19Z

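One follow-up: I retract item 1. My review was wrong.

I just re-traced how prep_result is built. The dict returned at copilot_prep.py:207 does contain "profile" (line 211), which comes from memory.get_profile(user_id) and includes weak_points (memory.py has maintained that field all along). You were right that it does not reproduce in testing: _get_weak_points gets its data correctly, so nothing needs to change here.

I previously said "prep_result has no profile key" from memory, without tracing it back to the source. Sorry about that.

One more reminder: your PR adds the strategy_recommendation message type on the backend, but the switch in frontend/src/hooks/useCopilotStream.js has no matching case, so the frontend silently drops the message and the panel never receives the MCTS search results. Remember to include the frontend change before merging; otherwise the end-to-end path does not work.

Push the other fixes together and I'll go through it again.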

…lout simulation
- Add MCTSConfig, MCTSNode, StrategyRecommendation data structures
- Add RewardModel with cosine similarity scoring (R = W1·Match + W2·Safe - W3·Risk)
- Add SimulationEngine with 3-level degradation rollout
- Add MCTSEngine with PUCT selection, LLM expansion, backpropagation
- Integrate MCTS into copilot WebSocket session (feature-flagged, off by default)
- Add 11 mcts_* settings to config and rollout LLM provider
- Add user-facing docs and 39 unit tests

AnnaSuSu · 2026-04-14T09:09:36Z

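@ldemon2333 First, thank you for the work you put into this PR. The module split (mcts_config / mcts_engine / reward_model / simulation_engine) is clean, the feature flag is off by default and non-invasive, the 3-level degradation strategy is well designed, and the 39 unit tests give good coverage. You followed up on every issue from the first review round, and the code quality itself fully meets the bar for merging.

After careful consideration, however, I have decided not to merge this PR for now. The reason is not code quality but the current stage of the project:

The Copilot real-time pipeline is still in its stabilization phase. Over the past few weeks the main pipeline itself has still been converging: the ASR startup logic, the WebSocket lifecycle, the trigger semantics for candidate/HR utterances, and the interfaces between the layers of the Intent Classifier → Answer Coach → Interview Monitor → HR Profiler pipeline have not fully settled. Adding a game-theoretic simulation module on top at this point means every change to the main pipeline must also account for the MCTS coupling points (the prep_state structure, the StrategyTreeNavigator interface, the WebSocket message types, the initialization order in _init_copilot_session, and so on). At this stage the maintenance cost would outweigh the marginal benefit of the feature.

This is a scope and timing decision, not a rejection of your work. Once the Copilot core pipeline has stabilized and we have confirmed "dynamic strategy" as a roadmap direction, the ideas in your branch (especially the embedding-based scoring in reward_model and the degradation strategy in the simulation engine) will be a good starting point. I suggest keeping the branch in your own fork so it is easy to pick up again later.

Thanks again for your contribution, and sorry for putting you through a full review cycle.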

ldemon2333 · 2026-04-14T10:03:27Z

Received, understood.

AnnaSuSu requested changes Apr 7, 2026

ldemon2333 force-pushed the main branch from 21cd084 to 2d14a68 Compare April 8, 2026 06:05

ldemon2333 mentioned this pull request Apr 8, 2026

Copilot real-time assistance: ASR audio has no role distinction; all speech is hardcoded as HR #14

Closed

AnnaSuSu mentioned this pull request Apr 8, 2026

Copilot frontend: the right-hand strategy analysis panel goes blank after several rounds of questions #15

Closed

ldemon2333 force-pushed the main branch from 2d14a68 to 40b6cc1 Compare April 8, 2026 07:19

AnnaSuSu closed this Apr 14, 2026

ldemon2333 wants to merge 1 commit into AnnaSuSu:main from ldemon2333:main


2 participants
